Potential-Based Reward Shaping for POMDPs (Extended Abstract)

Authors

  • Adam Eck
  • Leen-Kiat Soh
  • Sam Devlin
  • Daniel Kudenko
Abstract

We address the problem of suboptimal behavior caused by short horizons during online POMDP planning. Our solution extends potential-based reward shaping from the related field of reinforcement learning to online POMDP planning in order to improve planning without increasing the planning horizon. In our extension, information about the quality of belief states is added to the function optimized by the agent during planning. This information provides hints about where the agent might find high future rewards, helping it achieve greater cumulative rewards.
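To make the extension concrete, here is a minimal Python sketch under stated assumptions: a depth-limited lookahead over belief states in which a potential-based shaping term F(b, b') = γΦ(b') − Φ(b) is added to the immediate reward during planning. The `model` interface, the dict-based belief representation, and the entropy-based potential are hypothetical choices made for illustration; the paper's own potential function over beliefs may differ.

```python
import math

# A belief is represented here as a dict mapping hidden states to
# probabilities. The potential Phi below (negative belief entropy,
# favoring more certain beliefs) is an illustrative assumption for
# this sketch, not the paper's definition.
def potential(belief):
    return -sum(p * math.log(p) for p in belief.values() if p > 0)

def shaping(belief, next_belief, gamma):
    # Potential-based shaping term F(b, b') = gamma * Phi(b') - Phi(b).
    return gamma * potential(next_belief) - potential(belief)

def plan(belief, depth, gamma, model):
    """Depth-limited expectimax over beliefs. `model` is a hypothetical
    interface with actions(), observations(), reward(b, a),
    obs_prob(o, b, a), and update(b, a, o) -> next belief."""
    if depth == 0:
        return None, 0.0
    best_action, best_value = None, -math.inf
    for a in model.actions():
        value = model.reward(belief, a)
        for o in model.observations():
            p_o = model.obs_prob(o, belief, a)
            if p_o == 0.0:
                continue
            next_belief = model.update(belief, a, o)
            _, future = plan(next_belief, depth - 1, gamma, model)
            # The shaping term is added only inside planning; it steers
            # the short-horizon search toward high-potential beliefs.
            value += p_o * (shaping(belief, next_belief, gamma) + gamma * future)
        if value > best_value:
            best_action, best_value = a, value
    return best_action, best_value
```

Because F telescopes along any trajectory, potential-based shaping of this form is known not to change which policies are optimal in the fully observable MDP setting; the abstract's point is that the same mechanism can guide short-horizon online POMDP planners toward higher cumulative reward.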


Similar resources

Abstract MDP Reward Shaping for Multi-Agent Reinforcement Learning

Kyriakos Efthymiadis, Sam Devlin and Daniel Kudenko (Department of Computer Science, The University of York, UK). Reward shaping has been shown to significantly improve an agent's performance in reinforcement learning. As attention is shifting from tabula-rasa approaches to methods where some heuristic domain knowledge can be given...

A comparison of plan-based and abstract MDP reward shaping

Reward shaping has been shown to significantly improve an agent's performance in reinforcement learning. As attention is shifting away from tabula-rasa approaches, many different reward shaping methods have been developed. In this paper we compare two different methods of reward shaping: plan-based, in which an agent is provided with a plan and extra rewards are given according to the steps of ...

Improved Planning for Infinite-Horizon Interactive POMDPs using Probabilistic Inference (Extended Abstract)

We provide the first formalization of self-interested multiagent planning using expectation-maximization (EM). Our formalization in the context of infinite-horizon and finitely-nested interactive POMDP (I-POMDP) is distinct from EM formulations for POMDPs and other multiagent planning frameworks. Specific to I-POMDPs, we exploit the graphical model structure and present a new approach based on b...

Refining Diagnostic POMDPs with User Feedback

Bayesian networks have been widely used for diagnostics. These models can be extended to POMDPs to select the best action. This allows modeling the partial observability of causes and the utility of executing various tests. We describe the problem of refining diagnostic POMDPs when user feedback is available. We propose utilizing user feedback to pose constraints on the model, i.e., the transit...

Potential-based difference rewards for multiagent reinforcement learning

Difference rewards and potential-based reward shaping can both significantly improve the joint policy learnt by multiple reinforcement learning agents acting simultaneously in the same environment. Difference rewards capture an agent's contribution to the system's performance. Potential-based reward shaping has been proven to not alter the Nash equilibria of the system but requires domain-specific ...
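For reference, the two signals that paper combines have standard forms in the literature (notation here is generic, not taken from the paper): potential-based shaping adds a telescoping term to the environment reward, while the difference reward compares the system utility with and without agent i's contribution.

```latex
% Potential-based reward shaping (Ng et al., 1999):
F(s, s') = \gamma \, \Phi(s') - \Phi(s)
% Difference reward for agent i, where z_{-i} replaces agent i's
% action with a default/null action (Wolpert & Tumer):
D_i(z) = G(z) - G(z_{-i})
```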


Journal title:

Volume   Issue

Pages  -

Publication date: 2013